Apparatus and method for speech recognition

ABSTRACT

An apparatus and a method for speech recognition are provided in which the speech is optionally input via a microphone (14) close to the speaker or a microphone (20) remote from the speaker. A correction unit (15) is connected into the transmission channel (12) of the microphone (14) close to the speaker; the correction unit modifies the electrical speech signal in such a way that it contains room transmission characteristics.

The invention relates to an apparatus for speech recognition in which the speech is optionally converted into electrical signals via a microphone close to the speaker and is supplied to a recognition system via a first transmission channel, or is converted into electrical signals via a microphone remote from the speaker and is supplied to the recognition system via a second transmission channel, and in which the recognition system compares the speech elements recorded using the respective microphone with speech elements learned previously in a training phase and, in case of agreement, produces a recognition signal. In addition, the invention relates to a method for speech recognition.

DESCRIPTION OF THE RELATED ART

In the recognition of speech or of speech elements, there is often the difficulty that the speech elements input via a microphone are affected by and overlaid with variations in room acoustics. The transmission characteristics of the room can significantly influence the recognition rate of the recognition system. Previously realized apparatuses and methods for speech recognition do not take into account changes in the transmission function of the room. In general, previous apparatuses and methods have assumed that the transmission function in the transmission of the speech of a person remains the same up to the digital recording, both in the training phase and in later use for speech recognition, in particular in the case of speaker-dependent speech recognition.

However, in speech recognition via, e.g., a telephone, such an assumption cannot be made, because telephone systems currently in use allow switching between a microphone close to the speaker, in which the microphone of the telephone handset is held close to the mouth of the speaker, and a microphone remote from the speaker, in which (in a hands-free state) the microphone records voices at a greater distance. The typical distance for a microphone close to the speaker is in the range from 0 to 30 cm, that is, predominantly direct sound is converted into electrical signals. For a microphone remote from the speaker, the distance is greater, and sound components resulting from echo effects, wall reflections, and direct sound are mixed together. If the microphone close to the speaker is used during the training phase and a microphone remote from the speaker is used later, the recognition rate is decreased due to the different room transmission functions that result from the different transmission paths.

SUMMARY OF THE INVENTION

The object of the invention is to indicate an apparatus and a method for speech recognition that operate with high reliability, independent of the speaker's distance from a microphone.

This object is achieved by an apparatus for speech recognition, comprising a microphone close to a speaker or a microphone remote from the speaker, which produces electrical signals from speech elements of the speaker; a recognition system to which the electrical signals are supplied, the electrical signals being supplied via a first transmission channel when the microphone is a microphone close to the speaker, and the electrical signals being supplied via a second transmission channel when the microphone is a microphone remote from the speaker, the recognition system comparing speech elements recorded by the microphone with speech elements learned previously in a training phase, and, in case of agreement, producing a recognition signal; and a correction unit connected into the first transmission channel, the correction unit modifying the electrical signals in such a way that they have room transmission characteristics as they occur in recording with a microphone remote from the speaker. The correction unit can be configured to simulate acoustic reflections from nearby objects and/or room reverberation. The correction unit may be fashioned as a stationary filter or an adaptive filter, and the adaptive filter's parameters can be set depending on recorded audio signals. Each microphone may also be connected to a preamplifier. Compensation filters may also be provided for the compensation of varying microphone and amplifier frequency response characteristics. The recognition system may use a spectral analysis or an LPC cepstral analysis as its method.

The object of the invention is also achieved by a method for speech recognition, comprising the steps of: converting speech elements of a speaker into electrical signals using a microphone close to the speaker or a microphone remote from the speaker; supplying the electrical signals from the microphone, when the microphone is a microphone close to the speaker, to a recognition system via a first transmission channel; supplying the electrical signals from the microphone, when the microphone is a microphone remote from the speaker, to the recognition system via a second transmission channel; recording speech elements in a training phase; recording speech elements with the microphone in an operating phase; comparing the recorded speech elements in the training phase with the recorded speech elements in the operating phase in the recognition system and, in case of agreement, producing a recognition signal; and modifying the electrical signals from the first transmission channel in such a way that they have room transmission characteristics as they occur during recording with the microphone remote from the speaker. The correction unit can simulate acoustic reflections from nearby objects and/or room reverberation.

According to the invention, a correction unit is connected into the first transmission channel that modifies the electrical signal in such a way that it contains room transmission characteristics. Thus, the speech input via a microphone close to the speaker is modified in the electrical signal in such a way that it has the characteristics of speech that has been input via the microphone remote from the speaker. The correction unit is thus used to simulate the room acoustic influences for a relatively large speech transmission path. The correction unit simulates, for example, acoustic reflections from nearby objects and/or room reverberation.

BRIEF DESCRIPTION OF THE DRAWINGS

An exemplary embodiment of the invention is explained in the following on the basis of the drawings.

FIG. 1 is a schematic diagram showing an apparatus for speech recognition in which the speech is input via a telephone; and

FIG. 2 is a schematic diagram showing an apparatus according to FIG. 1 having adaptive filters.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 shows an apparatus for speech recognition in which the speech is input by a person 10 using a telephone. In the upper, first transmission channel 12, the speech is input using a microphone 14 close to the speaker, for example with the handset. The speech is converted into an electrical signal by the microphone 14 and is pre-amplified by an amplifier 16. A correction unit 15 modifies the electrical signal in such a way that it has the transmission characteristics of a room with a transmission path greater than close range. This correction unit 15 simulates, for example, room reverberation and/or sound reflections from nearby objects within the speech transmission path. Acoustic reflections of this sort can originate, for example, from a desktop, a display screen, or other objects. In contrast, room reverberation originates from relatively distant objects, such as the walls of the room. The electrical signal modified by the correction unit 15 runs through a compensation filter 18 that is used for the compensation of varying microphone and amplifier frequency response characteristics. The electrical signal is then supplied to a speech recognition unit 17, which carries out the further digital processing for the speech recognition.
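The behavior of the correction unit 15 as a stationary filter can be illustrated with a minimal sketch. The description does not specify how the filter is realized; the example below assumes, purely for illustration, a finite impulse response made of a direct-sound tap, two discrete taps standing in for reflections from nearby objects, and an exponentially decaying noise tail standing in for room reverberation. The function names, delays, gains, and the 8 kHz sample rate are hypothetical.

```python
import random


def make_room_response(sample_rate=8000,
                       early_reflections=((0.004, 0.5), (0.009, 0.3)),
                       reverb_seconds=0.2, reverb_gain=0.1, seed=0):
    """Build a toy room impulse response (all parameter values are assumptions)."""
    rng = random.Random(seed)
    length = round(reverb_seconds * sample_rate)
    h = [0.0] * length
    h[0] = 1.0  # direct sound
    # Discrete taps: reflections from nearby objects (desktop, display screen).
    for delay_seconds, gain in early_reflections:
        h[round(delay_seconds * sample_rate)] += gain
    # Decaying random tail: reverberation from distant surfaces (walls).
    for n in range(1, length):
        envelope = 10.0 ** (-3.0 * n / length)  # 60 dB decay over the tail
        h[n] += reverb_gain * envelope * rng.uniform(-1.0, 1.0)
    return h


def apply_correction(signal, h):
    """Convolve the close-microphone signal with the impulse response h."""
    out = [0.0] * (len(signal) + len(h) - 1)
    for i, x in enumerate(signal):
        if x == 0.0:
            continue
        for j, c in enumerate(h):
            out[i + j] += x * c
    return out


# Usage: a unit impulse at the close microphone comes out as the room response.
room = make_room_response()
corrected = apply_correction([1.0], room)
```

In a real system the impulse response would more plausibly be measured or estimated for the hands-free path than synthesized; the synthetic response only shows where the two kinds of acoustic influence enter the filter.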

In the lower part of FIG. 1, the inputting of speech elements via a hands-free apparatus is shown. The speech of the person 10 is modified by a special room transmission function RUF, i.e., the speech elements arriving at the microphone 20 from the speaker 10 are, for example, overlaid with acoustic reflections from nearby objects and with room reverberation and, possibly, with extraneous noise. The electrical signal of the microphone 20 remote from the speaker is pre-amplified by a pre-amplifier 22 and is supplied to a compensation filter 24 for the compensation of varying microphone and amplifier frequency response characteristics. The electrical signal filtered in this way is supplied to the speech recognition unit 17 for speech recognition.

In operation of the apparatus shown in FIG. 1, speech samples are stored in the data processing device 17 during a training phase. These samples can be used, for example, to construct a personal telephone directory. For this purpose, during the training phase, the name of a subscriber is spoken at least twice and is stored in a personal telephone directory with the telephone number associated with the name. After the end of the training phase, in the operating phase, the name is input once again, whereupon the data processing device 17 tries, using recognition methods such as spectral analysis or LPC cepstral analysis, to recognize this name on the basis of the previously stored name. In the case of a positive result, the data processing device 17 outputs the telephone number stored under this name and sets up the telephone connection. Because the correction unit 15 produces, in the transmission channel 12, an electrical speech signal having the same room characteristics as the speech signal of the second transmission channel 19, it is irrelevant for the speech recognition whether the microphone 14 or the microphone 20 is used during the training phase or during the recognition phase. Thus, using the correction unit 15, it is possible to use the telephone both with the handset and in hands-free operation.
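The training and operating phases of the personal telephone directory can be sketched as follows. This is a simplified illustration, not the recognition method of the data processing device 17: it assumes a crude spectral analysis (log energies in equal-width DFT bands), a Euclidean distance between feature vectors, and a fixed acceptance threshold standing in for "agreement". The class and function names, the threshold, and the pure tones used in place of spoken names are all hypothetical.

```python
import math


def band_energies(samples, n_bands=8):
    """Crude spectral analysis: log energy in n_bands equal-width DFT bands."""
    n = len(samples)
    half = n // 2
    power = []
    for k in range(half):
        re = sum(x * math.cos(2.0 * math.pi * k * t / n) for t, x in enumerate(samples))
        im = sum(x * math.sin(2.0 * math.pi * k * t / n) for t, x in enumerate(samples))
        power.append(re * re + im * im)
    width = half // n_bands
    return [math.log(1e-12 + sum(power[b * width:(b + 1) * width]))
            for b in range(n_bands)]


class VoiceDirectory:
    """Stores name templates with numbers and looks numbers up by voice."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold  # maximum distance that counts as agreement
        self.entries = {}           # name -> (list of templates, number)

    def train(self, name, number, utterances):
        # Training phase: the name is spoken at least twice.
        if len(utterances) < 2:
            raise ValueError("each name must be spoken at least twice")
        self.entries[name] = ([band_energies(u) for u in utterances], number)

    def recognize(self, utterance):
        # Operating phase: return the stored number, or None if nothing agrees.
        features = band_energies(utterance)
        best_number, best_distance = None, math.inf
        for templates, number in self.entries.values():
            for template in templates:
                distance = math.dist(features, template)
                if distance < best_distance:
                    best_number, best_distance = number, distance
        return best_number if best_distance <= self.threshold else None


def tone(dft_bin, amplitude=1.0, n=64):
    """Stand-in for an utterance: a pure tone centered on one DFT bin."""
    return [amplitude * math.cos(2.0 * math.pi * dft_bin * t / n) for t in range(n)]


directory = VoiceDirectory()
directory.train("alice", "555-0101", [tone(4), tone(4, 0.95)])
directory.train("bob", "555-0202", [tone(20), tone(20, 0.95)])
```

A slightly quieter repetition of a trained "name" (for example `tone(4, 0.9)`) still falls within the threshold, while an untrained one (for example `tone(12)`) is rejected.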

FIG. 2 shows a variant of the apparatus according to FIG. 1. In contrast to the apparatus according to FIG. 1, the correction unit 15 is fashioned as an adaptive filter, that is, the filter parameters are varied depending on the recorded audio signals. In this way the recognition rate can be increased. The compensation filters 18 and 24 in the respective transmission channels 12 and 19 are also fashioned as adaptive filters; their filter parameters are set depending on the recorded audio signals.
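The description does not state which adaptation rule sets the filter parameters. As one possible illustration, the sketch below assumes a normalized least-mean-squares (NLMS) update, a common choice for adaptive FIR filters, and assumes that a reference signal (close microphone) and a target signal (the same speech as heard by the remote microphone) are both available during adaptation. The function names, step size, and tap count are hypothetical.

```python
import random


def fir_filter(signal, taps):
    """Apply FIR taps to a signal; output has the same length as the input."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(taps):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out


def nlms_adapt(reference, target, n_taps, step=0.5, eps=1e-8):
    """Adapt FIR taps so that the filtered reference approximates the target."""
    taps = [0.0] * n_taps
    history = [0.0] * n_taps  # most recent reference samples, newest first
    for x, d in zip(reference, target):
        history = [x] + history[:-1]
        estimate = sum(c * s for c, s in zip(taps, history))
        error = d - estimate
        energy = eps + sum(s * s for s in history)
        # Normalized update: step scaled by the energy of the recent input.
        taps = [c + step * error * s / energy for c, s in zip(taps, history)]
    return taps


# Usage: recover a short, made-up "room" response from paired recordings.
rng = random.Random(1)
close_signal = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
true_room = [1.0, 0.0, 0.5, 0.25]          # hypothetical transmission path
remote_signal = fir_filter(close_signal, true_room)
learned = nlms_adapt(close_signal, remote_signal, n_taps=4)
```

With noise-free paired signals the learned taps approach the hypothetical room response; with real recordings, noise and a longer response would slow convergence and call for more taps.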

1. An apparatus for speech recognition, comprising: a microphone selected from a group consisting of a microphone close to a speaker and a microphone remote from said speaker, said microphone producing electrical signals from speech elements of said speaker; a recognition system to which said electrical signals are supplied, said electrical signals being supplied via a first transmission channel when said microphone is said microphone close to said speaker, and said electrical signals being supplied via a second transmission channel when said microphone is said microphone remote from said speaker, said recognition system comparing speech elements recorded by said microphone with speech elements learned previously in a training phase and, in case of agreement, producing a recognition signal; and a correction unit connected into said first transmission channel, said correction unit modifying said electrical signals such that said electrical signals have room transmission characteristics substantially as they occur in recording with said microphone remote from said speaker.
2. The apparatus according to claim 1, wherein said correction unit simulates acoustic reflections from nearby objects.
3. The apparatus according to claim 1, wherein said correction unit simulates room reverberation.
4. The apparatus according to claim 1, wherein said correction unit is fashioned as a filter selected from the group consisting of a stationary filter and an adaptive filter.
5. The apparatus according to claim 4, wherein said filter is an adaptive filter whose filter parameters are set in dependence on recorded audio signals.
6. The apparatus according to claim 1, further comprising: a preamplifier for said microphone in said first transmission channel; and a preamplifier for said microphone in said second transmission channel, when said second transmission channel is present.
7. The apparatus according to claim 1, further comprising: a compensation filter in said first transmission channel; and a compensation filter in said second transmission channel, when said second transmission channel is present; said compensation filters being provided for compensation of varying microphone and amplifier frequency response characteristics.
8. The apparatus according to claim 1, wherein said recognition system uses a speech recognition method selected from the group consisting of spectral analysis and LPC cepstral analysis.
9. A method for speech recognition, comprising the steps of: converting speech elements of a speaker into electrical signals using a microphone selected from the group consisting of a microphone close to said speaker, and a microphone remote from said speaker; supplying said electrical signals from said microphone, when said microphone is a microphone close to said speaker, to a recognition system via a first transmission channel; supplying said electrical signals from said microphone, when said microphone is a microphone remote from said speaker, to said recognition system via a second transmission channel; recording speech elements in a training phase; recording speech elements with said microphone in an operating phase; comparing said recorded speech elements in said training phase with said recorded speech elements in said operating phase in said recognition system and, in case of agreement, producing a recognition signal; and modifying said electrical signals from said first transmission channel such that said electrical signals have room transmission characteristics substantially as they occur during recording with said microphone remote from said speaker.
10. The method according to claim 9, further comprising the step of simulating acoustic reflections from nearby objects with said correction unit.
11. The method according to claim 9, further comprising the step of simulating room reverberation with said correction unit.